Financial Time Series Models (ARCH/GARCH)

In a slight detour in this project, this section concerns fitting time series data with statistical models that concern the variance in the data. These models are called AutoRegressive Conditional Heteroskedasticity (ARCH) and Generalized AutoRegressive Conditional Heteroskedasticity (GARCH) models and are used when there is an autoregressive term in the error variance. They are typically used when modeling time-series data that exhibit volatility, which happens most often in financial time series. Therefore, for this section, a set of stock data is retrieved and analyzed using a combination of ARCH/GARCH and ARIMA.

NextEra Energy Inc

For this section, the stock of choice is NextEra Energy Inc, which is quite volatile and connected to the renewable energy as it has the largest market cap or net worth of 159.73 billion U.S. dollars among other renewable energy companies. After gathering the adjusted closing prices from Yahoo Finance, the following visualization below shows the price trend over 20+ years.

Stock Prices

Gather NextEra Energy stocks from Yahoo Fiance

NEE is the ticker symbol for NextEra Energy’s stock.

Code
NEE_ALL <- getSymbols("NEE",auto.assign = FALSE, from = "2019-01-01",to = "2023-04-13",src="yahoo")
NEE_ALL=data.frame(NEE_ALL)
NEE_ALL <- data.frame(NEE_ALL, rownames(NEE_ALL))
colnames(NEE_ALL)[7] = "Year"
NEE_ALL$Year <- as.Date(NEE_ALL$Year,"%Y-%m-%d")
knitr::kable(head(NEE_ALL,3))
NEE.Open NEE.High NEE.Low NEE.Close NEE.Volume NEE.Adjusted Year
2019-01-02 43.1700 43.3250 42.2525 42.4575 10549600 38.75114 2019-01-02
2019-01-03 42.4775 42.7900 42.1675 42.3525 9260800 38.65531 2019-01-03
2019-01-04 42.2875 43.1475 42.1650 43.1325 10848800 39.36721 2019-01-04

Visualize NEE Stock

Code
# plotly
# change font family and size
t <- list(
  family = "Palatino",
  size = 14)

theme_set(theme_gray(base_size=12,base_family="Palatino"))
fig <- NEE_ALL |> plot_ly(x = ~Year, type="candlestick",
          open = ~NEE.Open, close = ~NEE.Close,
          high = ~NEE.High, low = ~NEE.Low, name="NEE") 

fig <- fig |> add_lines(x = ~Year, y = ma(NEE_ALL$NEE.Adjusted, 20), name="20-MA", line = list(color = 'deeppink', width = 1), inherit = F)

fig <- fig |> add_lines(x = ~Year, y = ma(NEE_ALL$NEE.Adjusted, 80), name="80-MA", line = list(color = 'cyan3', width = 1), inherit = F)

fig <- fig |> layout(title = "Candlestick Chart of NextEra Energy Stock Prices",
              font=t, 
              legend = list(orientation = 'h', 
                            x = 0.5, 
                            y = 1,
                            xanchor = 'center', 
                            yref = 'paper', 
                            font = list(size = 10), 
                            bgcolor = 'transparent'),
              xaxis = list(
                type = 'date',
                tickformat = "%d %B (%a)<br>%Y"
                ),
              plot_bgcolor = "#eeeeee"
              )

fig
Code
p <- NEE_ALL |>
  ggplot() +
  geom_line(aes(y=NEE.Adjusted, x=Year),color="aquamarine3") +
  ggtitle("NextEra Energy Stock Pirces") +
  ylab("USD") +
  xlab("Month") +
  scale_x_date(date_breaks = "1 month",
                             labels = scales::date_format(format = "%Y-%m")) +
  # rotate labels for visibility
  theme(axis.text.x = element_text(angle = 270, vjust = 0.5, hjust=1))

ggplotly(p)

From the above figures, we can see the spikes of volatility in the price, especially around certain points where the price dramatically drops and rises (see around mid 2020). This is a good first indicator that there will be an ARCH/GARCH component when fitting the data. Another way to look at this data is by taking a look at price versus volume. This just gives us a better sense of the price and a different perspective, but it doesn’t necessarily provide us with more insight.

Price Volumn Chart

Code
NEE_ALL <- getSymbols("NEE",auto.assign = FALSE, from = "2019-01-01",to = "2023-04-13",src="yahoo")
chartSeries(NEE_ALL, theme = chartTheme("white"), # Theme
            bar.type = "hlc",  # High low close 
            up.col = "green",  # Up candle color
            dn.col = "red",
            name = "NEE Stock")   # Down candle color)

Again, the above figure shows us more information on the price trends versus the volume, but the overall conclusion is the same. However, the next figure shows the volatility better as the log of the returns is taken and plotted.

Code
returns <- diff(log(NEE_ALL$`NEE.Adjusted`), lag = 1)
returns %>% chartSeries(theme = chartTheme("white"), name = "NEE Stock Daily Log Returns")

The above figure shows the daily log of the returns, which gives a much better sense of the volatility of the returns for NextEra Energy. Here, since there is clear volatility, an ARCH/GARCH model will be needed to fit the data. However, before fitting an ARCH model immediately, it is important to check the ACF/PACF plots to check if an ARIMA model should be fit first before fitting an ARCH model on the residuals. First, since we took the log of the returns, we should plot the ACF to check if we should difference the data.

ACF and PACF Plots

Code
log(NEE_ALL$'NEE.Adjusted') |> ggAcf() +
  ggtitle("ACF of Log Transformed NextEra Energy Prices")

From the above figure, it is clear that a difference must be done to account for the correlation in the data. After differencing the data once, the ACF and PACF plots are created and shown below.

Code
acf <- ggAcf(returns, lag.max = 100) +
  ggtitle("ACF of Log Transformed and Differenced NextEra Energy Prices")
ggplotly(acf)
Code
pacf <- ggPacf(returns, lag.max = 100) +
  ggtitle("PACF of Log Transformed and Differenced NextEra Energy Prices")
ggplotly(pacf)

Seasonality Test

Code
isSeasonal(returns, test = "combined", freq = NA)
[1] FALSE

According to the seasonality test, NextEra stock price is not seasonal time series data.

Model Fitting - ARIMA Model

Now that the data is differenced, the ACF and PACF plots clearly show a need for an ARIMA model to fit the data. Once an ARIMA model is fit, there will probably be a need to fit an ARCH model on the residuals, but that will depend on the residuals. Let’s fit an ARIMA model first. From the above ACF and PACF plots, it would be appropriate to check all Arima models with p values ranging from 1-7 and q values ranging from 1-7. For the ARIMA model, d is set to 1 as we differenced the data once. From that, the following results were achieved.

Code
d = 1
i = 1
temp = data.frame()
ls = matrix(rep(NA,6*43),nrow=43)

for (p in 1:7)
{
  for(q in 1:7)
  {
    if(p + d + q <= 12)
    {
      model <- Arima(log(NEE_ALL$'NEE.Adjusted'),order=c(p,d,q), include.drift = TRUE)
      ls[i,] = c(p,d,q,model$aic,model$bic,model$aicc)
      i = i + 1
    }
  }
}

temp = as.data.frame(ls)
names(temp) = c("p","d","q","AIC","BIC","AICc")
Code
#knitr::kable(temp)
knitr::kable(temp[which.min(temp$AIC),])
p d q AIC BIC AICc
26 4 1 5 -5615.411 -5560.62 -5615.163
Code
knitr::kable(temp[which.min(temp$BIC),])
p d q AIC BIC AICc
9 2 1 2 -5607.54 -5577.654 -5607.462
Code
# auto.arima()
auto.arima(log(NEE_ALL$'NEE.Adjusted'))
Series: log(NEE_ALL$NEE.Adjusted) 
ARIMA(4,1,5) 

Coefficients:
          ar1     ar2      ar3      ar4     ma1      ma2     ma3     ma4
      -0.1793  0.7322  -0.3307  -0.7621  0.1423  -0.7033  0.3190  0.6553
s.e.   0.0753  0.0750   0.0547   0.0549  0.0799   0.0814  0.0658  0.0646
         ma5
      0.0980
s.e.  0.0365

sigma^2 = 0.0003135:  log likelihood = 2817.93
AIC=-5615.87   AICc=-5615.66   BIC=-5566.06

By sorting the table, the smallest AIC has the parameters of 4,1,5, while by sorting using the BIC criterion, the model has parameters of 2,1,2. Next, a function allows for an auto-selection of an ARIMA model. According to the auto.arima() method in R, the parameters for the model should be 2,1,2. Therefore, the 2,1,2 model makes the most sense.

Next, the model should undergo some diagnostics to make sure the fit looks good. Below are some basic diagnostics run using the 2,1,2 model.

Code
sarimafit <- sarima(log(NEE_ALL$'NEE.Adjusted'), 2, 1, 2, details = FALSE)

Given the figure above, the following observations can be made. First, when looking at the plot of standardized residuals, the mean should be around 0, with the variance at about 1. As seen above, this is generally true, with the mean close to 0. However, the variance is probably slightly more than 1 and there is clusters of higher variance, which indicates a second model should be fit for the errors. Generally, if the mean or variance is significantly different than expected, this would be a sign of a poor model fit, but we don’t really see that here. Next, when inspecting the ACF of residuals, the plot shows no significant lags, which is a very encouraging sign considering the logic above. Next, when looking at the qq-plot, the hope is to see some signs of normality. Here, we see some signs of normality in the data, but with a bit of skew. Finally, when looking at the p-values for the Ljung-Box test, many of the p-values are above 0.05, which is a decent sign of a good model.

Next, we can do an initial check on the absolute returns and squared returns to check if an ARCH model is needed. Below is a set of visualizations for the ACF and PACF plots for the absolute and squared returns.

Code
fit <- arima(log(NEE_ALL$'NEE.Adjusted'), order = c(2, 1, 2))
res.arima <- fit$res
squared.res.arima <- res.arima^2
acf_residuals <- ggAcf(squared.res.arima, na.action = na.pass) + ggtitle("ACF Squared Residuals")
ggplotly(acf_residuals)
Code
fit <- arima(log(NEE_ALL$'NEE.Adjusted'), order = c(2, 1, 2))
res.arima <- fit$res
squared.res.arima <- res.arima^2
pacf_residuals <- ggPacf(squared.res.arima, na.action = na.pass) + ggtitle("ACF Squared Residuals")
ggplotly(pacf_residuals)

From the above plots and the previous diagnostics, it is pretty clear an ARCH model is needed to fit the residuals of the ARIMA model. Using the above plots, the p-value should range between 1-9. Then, using a similar technique to fitting the ARIMA model, a set of ARCH models can be created and evaluated to find the best fit for the residuals. Below is another table that shows the fitted ARCH models with their associated AICs.

Model Fitting - ARIMA(2,1,2) + ARCH Model

Code
ARCH <- list() ## set counter
cc <- 1
for (p in 1:9) {
  ARCH[[cc]] <- garch(res.arima, order = c(0, p), trace = F)
  cc <- cc + 1
}

## get AIC values for model evaluation
ARCH_AIC <- sapply(ARCH, AIC) ## model with lowest AIC is the best
min(ARCH_AIC)
[1] -5907.563
Code
which(ARCH_AIC == min(ARCH_AIC))
[1] 6
Code
ARCH[[which(ARCH_AIC == min(ARCH_AIC))]]

Call:
garch(x = res.arima, order = c(0, p), trace = F)

Coefficient(s):
       a0         a1         a2         a3         a4         a5         a6  
0.0000901  0.1575585  0.1380603  0.1605526  0.0433259  0.0436153  0.1958591  
Code
arch.df <- data.frame(p = 1:9, q = 0, AIC = ARCH_AIC)
knitr::kable(arch.df)
p q AIC
1 0 -5749.417
2 0 -5818.602
3 0 -5886.832
4 0 -5882.985
5 0 -5881.615
6 0 -5907.563
7 0 -5905.209
8 0 -5887.420
9 0 -5886.691

By sorting the table, the smallest AIC has the parameters of 6,0. Therefore, we should fit the residuals with a 6,0 ARCH model. Now that we have fit the data with an ARCH/ARIMA model, the next step is to check the fit. One option is to check the residuals. First, we can plot the residuals and evaluate the results. Additionally, we can check whether the residuals are normal.

Code
fit2 <- garch(res.arima, order = c(0,6), trace = FALSE)
checkresiduals(fit2)


    Ljung-Box test

data:  Residuals
Q* = 15.323, df = 10, p-value = 0.1207

Model df: 0.   Total lags used: 10
Code
qqnorm(fit2$residuals, pch = 1)
qqline(fit2$residuals, col = "blue", lwd = 2)

Code
Box.test(fit2$residuals, type = "Ljung")

    Box-Ljung test

data:  fit2$residuals
X-squared = 0.22758, df = 1, p-value = 0.6333

From above, we can see a couple of important diagnostic plots. The first set of plots show a set of 3 plots of the residuals, the ACF of the residuals, and a distribution of the residuals. The residuals seem to now have a constant mean and variance according to the first plot. Next, the ACF shows strong signs of stationarity in the residuals. Finally, the distribution of the residuals is relatively normal. This can be confirmed using the second tab, which is a qqnorm plot of the residuals. Using this plot, the normality of the residuals can be confirmed. These plots all point towards strong signs the model is a good fit for the data. Finally, a Ljung-Box test outputs a p-value greater than 0.05, indicating that the residuals are independent, encouraging the suitable model. Here, after doing the model diagnostics, the model above can be used for forecasting, and the model is ARCH(6,0) + ARIMA(2,1,2).

Forecast

Code
fit3 = garchFit(~ arma(2,2) + garch(6, 0), data = res.arima, trace = FALSE)
predict(fit3, n.ahead = 5, plot=TRUE)

meanForecast meanError standardDeviation lowerInterval upperInterval
0.0020155 0.0103173 0.0103173 -0.0182060 0.0222370
0.0015082 0.0121532 0.0121429 -0.0223117 0.0253281
0.0015581 0.0116165 0.0115736 -0.0212099 0.0243261
0.0016771 0.0123125 0.0122627 -0.0224550 0.0258092
0.0016808 0.0124056 0.0123578 -0.0226337 0.0259953

Using this model, the data can be effectively modeled and perhaps future values can be forecasted. However, stock market prices are generally impossible to forecast. We can say this model does a decent job at modeling the data and its variance.